SMALLWorlds - Multilingual Content-Controlled Monologues

Authors

  • Peter Juel Henrichsen
  • Marcus Uneson
Abstract

We present the speech corpus SMALLWorlds (Spoken Multilingual Accounts of Logically Limited Worlds), newly established and still growing. SMALLWorlds contains monologic descriptions of scenes or worlds which are simple enough to be formally describable. The descriptions are instances of content-controlled monologue: semantically “pre-specified” but still bearing most hallmarks of spontaneous speech (hesitations and filled pauses, relaxed syntax, repetitions, self-corrections, incomplete constituents, irrelevant or redundant information, etc.) as well as idiosyncratic speaker traits. In the paper, we discuss the pros and cons of data elicited in this way. Following that, we present a typical SMALLWorlds task: the description of a simple drawing with differently coloured circles, squares, and triangles, with no hints given as to which description strategy or language style to use. We conclude with an example of how SMALLWorlds may be used: unsupervised lexical learning from phonetic transcription. At the time of writing, SMALLWorlds consists of more than 250 recordings in a wide range of typologically diverse languages from many parts of the world, some unwritten and endangered.





Publication date: 2012